Neutrophil Analysis

Primary PCA results

The PCA results are actually very similar if you use a CPM cutoff of 1 or 5, so I am sticking with 1 for this analysis, because it will give us loadings for more genes.

source('setup.R')
cpm <- t(t(1e6*counts)/colSums(counts))
filt <- which(rowMeans(cpm) > 1)
pca <- BiocSingular::runPCA(t(log1p(cpm[filt, ])), rank=64)
require(plotly)
plot_ly(data.frame(pca$x), x=~PC1, y=~PC2, z=~PC3, color = pheno$color, colors = sort(unique(pheno$color)))

Here’s a legend for the above plot, since I couldn’t get the legend to work with plotly:

plot.new()
legend("left", pch = 16,
       legend = c('EC','SS', 'SC-A','VS-A', 'SC-B','VS-B', 'SC-C','VS-C'),
       col = c('blue','orange','green','forestgreen','red','firebrick','purple','slateblue4'))

The percentage of variance explained by each principal component. Based on this, I think we should keep 5-12 PCs for downstream analysis, but there’s no way to visualize that.

pctvar <- pca$sdev^2/sum(pca$sdev^2)
plot(pctvar, type= 'b')

Alternate PCA 1: CPM cut-off of 5

I also tried a couple alternative approaches that yielded pretty similar results.

filtAlt <- which(rowMeans(cpm) > 5)
pcaAlt <- BiocSingular::runPCA(t(log1p(cpm[filtAlt, ])), rank=64)
plot_ly(data.frame(pcaAlt$x), x=~PC1, y=~PC2, z=~PC3, color = pheno$color, colors = sort(unique(pheno$color)))

Alternate PCA 2: 10,000 most variable genes

require(DelayedArray)
filtAlt <- which(rowVars(cpm) >= sort(rowVars(cpm), decreasing = TRUE)[10000])
pcaAlt <- BiocSingular::runPCA(t(log1p(cpm[filtAlt, ])), rank=64)
plot_ly(data.frame(pcaAlt$x), x=~PC1, y=~PC2, z=~PC3, color = pheno$color, colors = sort(unique(pheno$color)))

Subset Analysis

Since we don’t have a huge number of samples, I don’t think we can get much out of this type of analysis. I would treat these as purely exploratory. When we do differential expression testing, then it’s fine to use smaller subsets (8 v. 8 is still pretty robust, in that case), but I wouldn’t read too much into the PCA plots.

Subset 1: All controls

ind <- which(pheno$group %in% c('SC','EC'))
subcpm <- cpm[,ind]
filt <- which(rowMeans(subcpm) > 1)
pca <- BiocSingular::runPCA(t(log1p(subcpm[filt, ])), rank=length(ind))
#plot(pca$sdev^2/sum(pca$sdev^2))
plot_ly(data.frame(pca$x), x=~PC1, y=~PC2, z=~PC3, color = pheno$color[ind], colors = sort(unique(pheno$color[ind])))

Subset 2: All post-bypass samples

ind <- which(pheno$timepoint %in% c('B','C'))
subcpm <- cpm[,ind]
filt <- which(rowMeans(subcpm) > 1)
pca <- BiocSingular::runPCA(t(log1p(subcpm[filt, ])), rank=length(ind))
#plot(pca$sdev^2/sum(pca$sdev^2))
plot_ly(data.frame(pca$x), x=~PC1, y=~PC2, z=~PC3, color = pheno$color[ind], colors = sort(unique(pheno$color[ind])))